NaN-friendly convolution #155
Conversation
I definitely like the overall scheme, but I have one organizational thought. I'm concerned about how ... Instead, how about leaving this ...
So if I understand correctly what you are saying, the user would then import it with
This does make sense, since scipy has the same:
On a side note, it would be possible to add functions such as gaussian_filter() that would again mirror the scipy capabilities, but with support for NaN values, and these could then be imported with:
Anyway, I'll implement your suggestion, but I will wait for other comments before doing so.
boundary: str, optional
    A flag indicating how to handle boundaries:

    * None : set the ``result`` values to zero where the kernel
      extends eyond the edge of the array (default)
Typo: eyond
Fixed
@astrofrog - yep, that's what I had in mind, as long as the function ...
I've implemented all the comments! Is there anything else you can think of before I extend this to more dimensions? How many dimensions should this be extended to? I'm not sure if many people would need 4D convolution and above, but 1D, 2D, and 3D certainly seem useful.
I'm having a little issue with optimization. On line 71 of
literally doubles the runtime. Does anyone have any idea how to do this differently? Is there an efficient way to override the multiplication (or use a custom multiplication function) that will have the behavior NaN * 0 = 0?
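For context: IEEE 754 arithmetic propagates NaN through multiplication, so NaN * 0 evaluates to NaN, which is why some branch or explicit check is unavoidable here. A minimal sketch of the desired behavior in plain Python/NumPy (the function name is illustrative, not from the PR's code):

```python
import numpy as np

def nan_safe_product(value, weight):
    """Return value * weight, treating NaN values as zero contribution.

    IEEE 754 arithmetic gives NaN * 0 = NaN, which would poison the
    running sum in a convolution loop, so a branch is needed somewhere.
    """
    if weight == 0.0 or np.isnan(value):
        return 0.0
    return value * weight

print(np.nan * 0.0)                    # nan (NaN propagates)
print(nan_safe_product(np.nan, 0.0))   # 0.0
print(nan_safe_product(2.0, 3.0))      # 6.0
```

In a hot inner loop the per-element branch is exactly the cost being discussed in this thread; the sketch only shows the semantics, not a fast implementation.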
If I try a different if statement such as:
then I don't get much of a slowdown. isnan was defined as a macro with:
The example I'm using has no zeros and no NaNs, so in both cases the if condition always evaluates to zero (false). But ...
I've found a way around my previous question about optimization which seems to work fine. I've now added some basic documentation. The next step will be to add more tests, extend to 1 and 3 dimensions, and add examples to the documentation.
I agree that 1D convolution should be the next priority, followed by 3D. I can't think of any particular reason why you'd want to implement 4D (for that matter, offhand I can't think of any common cases for 3D, but I imagine those do exist).

I'm also getting 3 test failures for your current version on OS X 10.6 (py2.7.2 32-bit): http://paste.pocoo.org/show/551273 http://paste.pocoo.org/show/551274 http://paste.pocoo.org/show/551275
for ii in range(iimin, iimax):
    for jj in range(jjmin, jjmax):
        iii = ii % nx
        jjj = jj % ny
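For readers following along, the modulo indexing in this hunk is what implements the "wrap" boundary: out-of-range kernel offsets fold back to the opposite edge of the array. A naive pure-Python/NumPy sketch of the same idea (illustrative only, not the PR's Cython code):

```python
import numpy as np

def convolve2d_wrap(image, kernel):
    """Naive 2-D correlation with periodic ('wrap') boundaries.

    Out-of-bounds indices fold back with the % operator, as in the
    Cython loop above. The kernel is applied unflipped (correlation),
    which matches convolution for symmetric kernels.
    """
    nx, ny = image.shape
    wkx, wky = kernel.shape[0] // 2, kernel.shape[1] // 2
    result = np.zeros_like(image, dtype=float)
    for i in range(nx):
        for j in range(ny):
            total = 0.0
            for ii in range(i - wkx, i + wkx + 1):
                for jj in range(j - wky, j + wky + 1):
                    iii = ii % nx  # wrap around the x edge
                    jjj = jj % ny  # wrap around the y edge
                    total += image[iii, jjj] * kernel[wkx + ii - i, wky + jj - j]
            result[i, j] = total
    return result
```

This also makes the reviewer's point concrete: `ii % nx` sits in the innermost loop, so any per-operation divide-by-zero check there is paid once per kernel element per pixel.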
This `%` operator defaults (in cython > 0.12) to checking whether `nx` or `ny` is 0 (and raising a ZeroDivisionError if so), at a potentially significant performance penalty (http://wiki.cython.org/enhancements/compilerdirectives) - you would fix this by adding the `cdivision` directive I note above.
Weirdly, it turns out that if I add the cdivision directive, the tests segfault and do not complete:
It seems that the tests segfault both on linux and mac, so I'm not sure what's going on. So for now, I'm leaving the directive out.
I just ran the tests right now after adding in `@cython.cdivision(True)` for all four pyx files and got no segfaults (although one of the tests did fail). It's only a ~5-10% speedup to add this (for a 500x500 array w/ 3x3 kernel), though, so perhaps it's not a big deal.
@astrofrog - you may know this already, but one very useful trick for optimizing Cython code: do ...
…values get overwritten.
No, this is not ready yet - I need to fix the normalization as you suggested, and implement 1-D and 3-D convolution. I was away for the last week, but should be able to make progress on this soon.
…allow nested lists to be passed instead of arrays.
…-dimensional convolution.
@eteq - you can now pass nested lists as input, integer values will work, and you can normalize the kernel on the fly (though the default is not to). I've implemented 1-d and 3-d convolution, and I'm now going to improve the docs a little before a final review.
This pull request is now ready for final review!
I think it might be good to organize the documentation you added slightly differently (to match other packages), but we can address that later after the content itself is merged in. (And overall, I like the doc content a lot!) All tests pass for me, so as far as I'm concerned, you can merge when ready.
Agree about the documentation structure - I just wanted to have something there, but it needs to be integrated better into the rest of the docs. @taldcroft, @iguananaut, @mdboom - I plan to merge this at the start of the week, in case you want to review this too.
I'm a latecomer to this thread, but I have an alternative solution to the convolution-ignoring-nans problem implemented here: http://code.google.com/p/agpy/source/browse/trunk/AG_fft_tools/convolve_nd.py

It uses FFTs (either numpy's or FFTW3's, if FFTW3 is installed) and gets around NaNs by setting them to zero, then creating a "weight" array that is 1 where the image is not nan and 0 where it is nan. The weight array is smoothed with the same kernel as the image, then divided out. In any location where the kernel does not encounter NaNs, the weight stays 1, but if there is a nearby NaN, the weight will be decreased, so the average will be over fewer pixels.

The implementation posted above works in N dimensions. It could use more & better unit tests, and it would be especially good to compare directly to the "convol" approach implemented here. I think it would be good to include both implementations, but the pull request is obviously more mature in terms of astropy compliance.
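The weight-array trick described above can be sketched in a few lines of NumPy (a minimal illustration of the idea, not the code in the linked repository; note this sketch uses circular boundaries and anchors the kernel at the array origin, so results are shifted relative to a centered kernel):

```python
import numpy as np

def convolve_fft_nan(image, kernel):
    """NaN-tolerant FFT convolution via a smoothed weight array.

    NaNs are set to zero, a weight map (1 = valid, 0 = NaN) is convolved
    with the same kernel, and the smoothed image is divided by the
    smoothed weights so averages run over valid pixels only.
    """
    nan_mask = np.isnan(image)
    filled = np.where(nan_mask, 0.0, image)
    weights = np.where(nan_mask, 0.0, 1.0)

    # Convolve image and weights with the same normalized kernel via FFTs.
    norm_kernel = kernel / kernel.sum()
    fk = np.fft.rfftn(norm_kernel, s=image.shape)
    smooth_image = np.fft.irfftn(np.fft.rfftn(filled) * fk, s=image.shape)
    smooth_weight = np.fft.irfftn(np.fft.rfftn(weights) * fk, s=image.shape)

    # Where all contributing pixels were NaN, the weight is 0 -> NaN result.
    with np.errstate(divide="ignore", invalid="ignore"):
        return smooth_image / smooth_weight
```

Because the whole thing is products of full-size transforms, this is where the O(n log n) speed and the memory-cost concern discussed below both come from.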
@keflavich Have you done any performance testing? I suspect this PR is faster because it's all Cython-optimized... and I'm also concerned about the memory cost of an FFT-based approach. Is there some other advantage to the FFT approach aside from allowing arbitrary dimensionality?

I'm inclined to suggest that we merge this pull request now, and then @keflavich, if you want to re-write your code into an astropy-compliant form, you can issue a new PR later that either adds an n-dimensional option or replaces this one once it's ready (if there's a compelling reason to replace this one). Does that sound good?
I was under the impression that FFTs are generally the fastest way to ...

The memory cost is certainly an issue. Again, I like having both ...

While cython optimization is probably pretty fast, numpy's ffts and ...

I haven't written tests for convolution yet... I'd like to see that ...

On Wed, Mar 21, 2012 at 4:14 PM, Erik Tollerud

Adam
@keflavich I see your point here on the O(n log n) scaling, so I can definitely see the merits of having both available. I'm mainly concerned about the potential for confusion in having a variety of convolution options. But I think this could be easily remedied by an explicit enough naming scheme - e.g., where this pull request centers around the driver function just named ...

Would you be fine with us merging this one as is, and have you submit a pull request later for the fft-based version (after you've got the docstrings and code to astropy standards)?
Absolutely, that's essentially what I intended. I just wanted to bring ...

On Wed, Mar 21, 2012 at 6:40 PM, Erik Tollerud

Adam
I also agree that we could always add a ...

I'm going to merge this PR now since we agree that is the way to proceed.
On the convolvefft issue, I'm kind of in favor of just having a single n-dimensional ...

But the way I see it, they're just different strategies for doing the same thing. The advantages and disadvantages of each strategy can be explained in the docstring (just as @astrofrog's implementation is already doing for the different boundary strategies).

The downside I see to this approach is that each implementation has some optional arguments that are in no way compatible with the other, which could lead to confusion. Though I tend to think that with a bit of cleaning up and well-organized documentation, this can be mitigated.
@iguananaut - perhaps you should copy this comment into #182 now that it exists as a separate PR?
Noticed while writing convolve_fft (#182) tests - convolve may change the kernel array (modify it in-place). This happens if kernel.dtype.kind == 'f'. I think this can be solved by replacing the initial checking routines with `kernel = np.asarray(kernel, dtype='float')`, as @mdboom suggested on #182. If you want to keep that behavior, it should be documented.
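The hazard and the fix can be illustrated in a few lines (a hedged sketch, not the PR's code; note that `np.asarray` alone does not copy when the input is already a float array, so an explicit copy is shown here):

```python
import numpy as np

def normalize_kernel_inplace(kernel):
    """Stand-in for code that (like the reported bug) mutates its argument."""
    kernel /= kernel.sum()  # in-place division alters the caller's array
    return kernel

k = np.ones((3, 3))              # float kernel, so it gets mutated
normalize_kernel_inplace(k)
print(k[0, 0])                   # now 1/9, not 1: caller's array changed

# Defensive fix: operate on a private float copy of the kernel.
k2 = np.ones((3, 3))
normalize_kernel_inplace(np.array(k2, dtype=float, copy=True))
print(k2[0, 0])                  # still 1.0: original untouched
```

An integer kernel never shows the bug, because the float conversion itself already produces a new array, which is why the report is specific to `kernel.dtype.kind == 'f'`.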
@keflavich - good catch! I'll fix this.
This pull request implements convolution functions for multi-dimensional arrays, with proper support for NaN values (which scipy's `convolve` does not have, but which is needed for astronomical images). The current code has the following limitations:

I want to address all these points before we consider merging in the code. However, I wanted to open the pull request to get feedback on the existing code, before extending the code to 1D and 3D+, and writing more tests, since these will involve significantly more code.
The only function intended for users is:
This calls the different Cython functions depending on dimensionality and boundary treatment. The reason for implementing the four different Cython functions is that it is significantly faster than constantly checking the boundary option inside a single Cython function.
The `convolve` function's docstring should explain all the options.

I decided to include this in `astropy.nddata` to mirror `scipy.ndimage` (rather than putting it in `astropy.tools`). I think that is the right thing to do. I think that `convolve` should be callable directly as above, but of course we can also add a `convolve` method for `NDData` objects that then relies on `nddata.convolve`.

The treatment of the edges and NaN values is inspired by IDL's CONVOL function (e.g. http://star.pst.qub.ac.uk/idl/CONVOL.html).
Performance-wise, the speed is very similar to scipy's `convolve` when `boundary=None`, and slightly worse for other boundary options, but that is the price to pay for dealing with the NaN values correctly.

Let me know what you think!
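To make the NaN treatment concrete, here is a small NumPy sketch of a direct convolution that drops NaN neighbors and renormalizes the kernel over the pixels actually used, in the spirit of the IDL CONVOL-style scheme described above (illustrative code only, not the PR's Cython implementation):

```python
import numpy as np

def convolve_nan_2d(image, kernel):
    """Direct 2-D convolution that ignores NaN pixels.

    At each position, NaN neighbors are excluded and the kernel is
    renormalized over the pixels actually used, so NaN values are
    effectively replaced by a kernel-weighted local average.
    Out-of-bounds kernel positions simply contribute nothing.
    """
    nx, ny = image.shape
    wkx, wky = kernel.shape[0] // 2, kernel.shape[1] // 2
    result = np.zeros((nx, ny))
    for i in range(nx):
        for j in range(ny):
            total = 0.0
            weight = 0.0
            for ii in range(max(0, i - wkx), min(nx, i + wkx + 1)):
                for jj in range(max(0, j - wky), min(ny, j + wky + 1)):
                    val = image[ii, jj]
                    if not np.isnan(val):
                        k = kernel[wkx + ii - i, wky + jj - j]
                        total += val * k
                        weight += k
            result[i, j] = total / weight if weight != 0 else np.nan
    return result
```

For a constant image with a single NaN pixel, this returns the constant everywhere, including at the NaN position, which is the "NaN-friendly" behavior the PR is after; the production code does the same kind of accumulation in typed Cython loops for speed.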